Analysis and Implementation of Modified K-medoids Algorithm to Increase Scalability and Efficiency for Large Dataset

نویسندگان

  • Gopi Gandhi
  • Rohit Srivastava
چکیده

Clustering plays a vital role in research area in the field of data mining. Clustering is a process of partitioning a set of data in a meaningful sub classes called clusters. It helps users to understand the natural grouping of cluster from the data set. It is unsupervised classification that means it has no predefined classes. Applications of cluster analysis are Economic Science, Document classification, Pattern Recognition, Image Processing, text mining. Hence, in this study some algorithms are presented which can be used according to one’s requirement. K-means is the most popular algorithm used for the purpose of data segmentation. K-means is not very effective in many cases. Also it is not even applicable for data segmentation in some specific kinds of matrices like Absolute Pearson. Whereas K-Medoids is considered flexible than k-means and also carry compatibility to work with almost every type of data matrix. The medoid computed using k-Medoids algorithm is roughly comparable to the median. After checking the literature on median, we have found a number of advantages of median over arithmetic mean. In this paper, we have used a modified version of k-medoids algorithm for the large data sets. Proposed k-medoids algorithm has been modified to perform faster than k-means because speed is the major cause behind the k-medoids unpopularity as compared to kmeans. Our experimental results have shown that improved k-medoid performed better than k-means and k-medoid in terms of cluster quality and elapsed time

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Application of modified balanced iterative reducing and clustering using hierarchies algorithm in parceling of brain performance using fMRI data

Introduction: Clustering of human brain is a very useful tool for diagnosis, treatment, and tracking of brain tumors. There are several methods in this category in order to do this. In this study, modified balanced iterative reducing and clustering using hierarchies (m-BIRCH) was introduced for brain activation clustering. This algorithm has an appropriate speed and good scalability in dealing ...

متن کامل

Intrusion Detection based on a Novel Hybrid Learning Approach

Information security and Intrusion Detection System (IDS) plays a critical role in the Internet. IDS is an essential tool for detecting different kinds of attacks in a network and maintaining data integrity, confidentiality and system availability against possible threats. In this paper, a hybrid approach towards achieving high performance is proposed. In fact, the important goal of this paper ...

متن کامل

خوشه‌بندی داده‌ها بر پایه شناسایی کلید

Clustering has been one of the main building blocks in the fields of machine learning and computer vision. Given a pair-wise distance measure, it is challenging to find a proper way to identify a subset of representative exemplars and its associated cluster structures. Recent trend on big data analysis poses a more demanding requirement on new clustering algorithm to be both scalable and accura...

متن کامل

Clustering of nasopharyngeal carcinoma intensity modulated radiation therapy plans based on k-means algorithm and geometrical features

Background: The design of intensity modulated radiation therapy (IMRT) plans is difficult and time-consuming. The retrieval of similar IMRT plans from the IMRT plan dataset can effectively improve the quality and efficiency of IMRT plans and automate the design of IMRT planning. However, the large IMRT plans datasets will bring inefficient retrieval result. Materials and Methods: An intensity-m...

متن کامل

Parallel K-Medoids++ Spatial Clustering Algorithm Based on MapReduce

Clustering analysis has received considerable attention in spatial data mining for several years. With the rapid development of the geospatial information technologies, the size of spatial information data is growing exponentially which makes clustering massive spatial data a challenging task. In order to improve the efficiency of spatial clustering for large scale data, many researchers propos...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014